The VERICLIG Project: Extraction of Computer Interpretable Guidelines via Syntactic and Semantic Annotation

Authors

  • Camilo Thorne
  • Marco Montali
  • Diego Calvanese
  • Elena Cardillo
  • Claudio Eccher
Abstract

We consider the problem of extracting formal process representations of the therapies defined by clinical guidelines, viz., computer interpretable guidelines (CIGs), based on UMLS and semantic and syntactic annotation. CIGs enable the application of formal methods (such as model checking, verification, conformance assessment) to the clinical domain. We argue that, although clinical guidelines are only minimally structured, the correspondences between guideline syntax and discourse relations on the one hand and clinical process constructs on the other should be exploited to successfully extract CIGs. We review current work on clinical syntactic and semantic annotation, pinpointing its limitations, and discuss a CIG extraction methodology based on recent efforts on business process modelling notation (BPMN) model extraction from natural language text.

1 Problem Description

Clinical guidelines are evidence-based documents compiling the best practices for the treatment of an illness or medical condition (e.g., lung cancer, flu or diabetes): they are regarded, following Shahar et al. (2004), as a major tool in improving the quality of medical care. More concretely, they describe or define the "ideal" (most successful) care plans or therapies healthcare professionals should follow when treating an "ideal" (i.e., average) patient for a given illness. Being general, guidelines need to be modified or instantiated, relative to the available resources, by health institutions, patients or doctors into protocols, and thereafter implemented as clinical workflows or careflows within clinical information systems. An important intermediate step in the synthesis of protocols and careflows from guidelines is provided by computer interpretable guidelines (CIGs), viz., formal representations of the main control flow features of the described treatment and of its process or plan structure.
CIGs can be exploited in a plethora of ways by clinical decision support systems: to provide execution support and recommendations to the practitioners involved, to guide the refinement into executable clinical protocols and careflows, and to check for conformance and compliance. Clinical document processing, and in particular the authoring of CIGs, protocols and careflows, is however a very costly and error-prone task, as it involves many layers of manual processing and annotation by experts. This explosion in costs, as pinpointed by Goth (2012), raises the need to develop biomedical NLP techniques, specifically: (1) clinical information extraction (IE) techniques and (2) automated CIG extraction methodologies.

1.5.1.2 Emphasise advice on healthy balanced eating that is applicable to the general population when providing advice to people with type 2 diabetes.
1.5.1.3 Continue with metformin if blood glucose control remains inadequate and another oral glucose-lowering medication is added.

Figure 1: An excerpt from the NICE type 2 diabetes clinical guideline. Each line describes atomic treatments that combine together into a complex therapy.

The VERICLIG project (http://www.inf.unibz.it/~cathorne/vericlig), a joint project involving the KRDB Research Centre for Knowledge and Data (Faculty of Computer Science, Free University of Bozen-Bolzano) and the eHealth group from the Fondazione Bruno Kessler (Trento), intends to address research problem (2) by adopting a computational semantics approach that aims at extracting CIGs from textual clinical guidelines. Our objective is to extract the main control-flow structures emerging from the textual description of guidelines in order to explore, in a second step, the possibility of expressing them using well-known representation languages. One such language is the business process modelling notation (BPMN) standard (see Ko et al. (2009)).
Process specification and representation languages make it possible to leverage formal methods (verification, model checking), as in Hommersom et al. (2008), which are useful for reasoning about the extracted CIGs and relating them to the corresponding executed clinical process. To realize our objective, we build on the work on clinical semantic and syntactic annotation mentioned above, as well as on recent efforts on BPMN model extraction by Friedrich et al. (2011).

2 Clinical Guidelines and Processes

Clinical guidelines, for instance those related to chronic conditions such as diabetes, allergies or lactose intolerance, are minimally structured documents. They nevertheless possess some crucial features: (1) they describe a process, generically intended as a set of coordinated activities, structured over time, to jointly reach a certain goal, and (2) the structure of the process they describe is significantly reflected by English syntax and vocabulary.

Processes. There are several ways to formally characterize processes, but little consensus as to which is the most appropriate for therapies. Thus, we do not intend at this stage to commit ourselves in the VERICLIG project to a particular formalism, but rather to focus on the main features such formalisms share, and in particular on their most basic, common constructs. For convenience, we use the terminology of the BPMN standard. In BPMN a process is a complex object constituted by the following basic components:

  (i) activities (e.g., providing advice, controlling blood glucose levels), representing units of execution in the process;
  (ii) participants, viz., the actors (e.g., doctors, nurses, patients), represented using pools, which are independent, autonomous points of execution, and possibly lanes, detailing participants belonging to the same pool;
  (iii) artifacts or resources (e.g., metformin) used or consumed by activities;
  (iv) control flows and gates (e.g., "if ... then ... else" control structures) that specify the acceptable orderings among activities inside a pool;
  (v) message flows, representing information exchange between activities and participants belonging to different pools.

Process-evoking Categories. In English, content words provide the vocabulary of the domain, denoting the objects, sets and (non-logical) relations that hold therein; their meaning (denotation) is static. Function words, on the other hand, denote the logical constraints, relationships and operations holding over such sets and relations. This distinction also holds to some degree (as allowed by their inherent ambiguity) in clinical domain documents, giving rise to process-evoking word categories (and constituents). Figure 1 provides an excerpt taken from a diabetes guideline (http://www.nice.org.uk/nicemedia/pdf/CG87NICEGuideline.pdf). In it, activities, actors and artifacts/resources (i.e., static information) are denoted by content words. Here and in what follows we refer to Penn Treebank word category and syntactic constituent tags, see Marcus et al. (1993). Activities are often denoted by transitive, intransitive or ditransitive verbs, viz., VBs and VBZs, participles (VBNs), gerunds (VBGs), [...]

Semantic annotation and parse tree of guideline 1.5.1.3 from Figure 1:

  Sentence: Continue with metformin if blood glucose control remains inadequate and another oral glucose-lowering medication is added.
  Semantic types (in order of appearance): reg. activity, pharm. substance, laboratory procedure, ql. concept, ql. concept, therapeutic procedure, fc. concept

  (S
    (VP (VB Continue)
      (PP (IN with) (NP (NN metformin)))
      (SBAR (IN if)
        (S
          (S (NP (NN blood) (NN glucose) (NN control))
             (VP (VBZ remains) (ADJP (JJ inadequate))))
          (CC and)
          (S (NP (DT another) (JJ oral) (JJ glucose-lowering) (NN medication))
             (VP (VBZ is) (VP (VBN added))))))))
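
To make the link between Penn Treebank tags and process constructs more concrete, the following Python sketch reads the bracketed parse of guideline 1.5.1.3, lists the verb-tagged words as activity candidates, and pulls out the if-clause as a candidate gate condition. It is only a minimal illustration and not the VERICLIG extraction method: the names Node, parse, tagged_words and if_conditions are hypothetical, and treating every verb tag as an activity and every if-introduced SBAR as a gate is a simplifying assumption.

```python
from dataclasses import dataclass, field

# Bracketed Penn Treebank parse of guideline sentence 1.5.1.3 (from the text).
TREE = (
    "(S (VP (VB Continue) (PP (IN with) (NP (NN metformin))) "
    "(SBAR (IN if) (S (S (NP (NN blood) (NN glucose) (NN control)) "
    "(VP (VBZ remains) (ADJP (JJ inadequate)))) (CC and) "
    "(S (NP (DT another) (JJ oral) (JJ glucose-lowering) (NN medication)) "
    "(VP (VBZ is) (VP (VBN added))))))))"
)

# Assumption: the verb tags named in the text mark activity candidates.
VERB_TAGS = {"VB", "VBZ", "VBN", "VBG"}


@dataclass
class Node:
    """A constituent; children are Node objects or plain word strings."""
    label: str
    children: list = field(default_factory=list)


def parse(text):
    """Parse one bracketed tree with a small recursive-descent reader."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected opening bracket"
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node.children:
        words.extend(leaves(child) if isinstance(child, Node) else [child])
    return words


def tagged_words(node):
    """Yield (tag, word) pairs for every word-level node."""
    if len(node.children) == 1 and isinstance(node.children[0], str):
        yield node.label, node.children[0]
    for child in node.children:
        if isinstance(child, Node):
            yield from tagged_words(child)


def if_conditions(node):
    """Yield the text of every SBAR clause introduced by 'if'."""
    for child in node.children:
        if not isinstance(child, Node):
            continue
        first = child.children[0] if child.children else None
        if (child.label == "SBAR" and isinstance(first, Node)
                and first.label == "IN"
                and [w.lower() for w in leaves(first)] == ["if"]):
            yield " ".join(leaves(child)[1:])
        yield from if_conditions(child)


root = parse(TREE)
activity_candidates = [w for tag, w in tagged_words(root) if tag in VERB_TAGS]
gate_conditions = list(if_conditions(root))
print("activity candidates:", activity_candidates)
print("gate conditions:", gate_conditions)
```

The verb list deliberately over-generates (it includes "remains" and "is"), which illustrates why syntactic tags alone are not enough and why semantic annotation of the kind shown in the figure would be needed to filter the candidates.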


Publication date: 2013